Abstract: PC-CUBE is an ensemble of IBM PCs or close compatibles
connected in the hypercube topology with ordinary computer cables.
Communication occurs at the rate of 115.2 Kbaud via the RS-232
serial links. Available for PC-CUBE are the Crystalline Operating
System III (CrOS III), the Mercury Operating System, and CUBIX and
PLOTIX, which are parallel I/O and graphics libraries. A CrOS
performance monitor was developed to facilitate the measurement of
the communication and computation time of a program and their
effects on performance. Also available are CXLISP, a parallel
version of the XLISP interpreter; GRAFIX, a set of graphics routines
for the EGA and CGA; and a general execution profiler for
determining the execution time spent in program subroutines.
PC-CUBE provides a programming environment similar to all hypercube
systems running CrOS III, Mercury and Cubix. In addition, every
node (personal computer) has its own graphics display monitor and
storage devices. These allow data to be displayed or stored at
every processor, which has much instructional value and enables
easier debugging of applications. Some application programs taken
from the book Solving Problems on Concurrent Processors [Fox 88]
were implemented with graphics enhancement on PC-CUBE. The
applications range from the Mandelbrot set, the Laplace equation,
the wave equation, and long-range force interaction to Wa-Tor, an
ecological simulation.




1. Introduction* 

Parallel computer systems promise to provide unprecedented high
performance (in large configurations) and better price/performance
than sequential computers (in small configurations). Commercial
distributed-memory parallel systems have been available for a few
years. Programming in a parallel environment is not difficult.
However, it is also not as straightforward as programming
sequentially, especially for those of us who have learned and
practiced sequential programming for many years. Programming
parallel systems to perform concurrent computations requires new
techniques that are best learned through hands-on experience on
real parallel computers.

High-performance parallel computers are not usually available in an
academic environment. It is also extravagant to use such a system
to teach a class. An alternative is to build your own parallel
computer using already existing microcomputers [Ho 88, 88b], like
the IBM personal computers in a microcomputer laboratory. This
paper describes the use of the PC-Cube parallel system, an IBM
PC-based entry-level hypercube, as an instructional and
developmental tool for parallel processing.

2. PC-Cube, An Entry-Level Hypercube 

The main objective of the PC-Cube project is to 
develop a true hypercube system for use in a micro- 
computer laboratory as an instructional and devel- 
opmental tool for parallel processing. The PC-Cube 
package consists of: 

(1) Communication hardware that enables the hypercube connection
of IBM PCs or close compatibles.

(2) A software environment for the PC-based hypercube which allows
applications to be written in CrOS III [Kolawa 86], Cubix
[Salmon 86], or Mercury [Lee 86].

(3) Demonstrations that illustrate the use of the PC-Cube package
by running some of the applications found in Solving Problems on
Concurrent Processors.

PC-Cube is an ensemble of IBM PCs or compatibles interconnected in
the hypercube topology. The PCs can be thought of as being located
at the vertices of an imaginary hypercube, and the edges of the
multi-dimensional cube are replaced by ordinary cables. The control
processor (CP), another PC, is connected to Node 0, which is one of
the node PCs. Instead of using special communication channels as in
the commercial hypercubes, PC-Cube uses inexpensive RS-232 serial
ports. To exploit the maximum communication capacity of the ports,
low-level routines have been written to address the UART (Universal
Asynchronous Receiver/Transmitter) chip on the serial board. A very
high baud rate of 115.2 Kbaud, which is the hardware limit, is
achieved for node-to-node data transmission. PC-Cube provides a
system that is balanced between processing and communication speed.
Applications that are written for PC-Cube will also run on larger
and faster commercial hypercubes with little or no modification.
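The hypercube wiring described above can be illustrated with a short C sketch. It assumes the usual convention that nodes are numbered 0 to n-1 and that two nodes are cabled together when their numbers differ in exactly one bit; the function names are hypothetical and are not part of the PC-Cube package, and the mapping of bit positions to PC-CrOS channel numbers is likewise an assumption.

```c
#include <stdio.h>

/* Number of nodes in a hypercube of dimension d (n = 2^d). */
unsigned cube_size(unsigned d)
{
    return 1u << d;
}

/* Neighbor of `node` across bit position `chan` (assumed numbering):
   flipping one bit of the node number crosses one cube edge. */
unsigned cube_neighbor(unsigned node, unsigned chan)
{
    return node ^ (1u << chan);
}

/* Print every node-to-node cable of a d-dimensional cube exactly once. */
void print_cables(unsigned d)
{
    unsigned node, chan;

    for (node = 0; node < cube_size(d); node++) {
        for (chan = 0; chan < d; chan++) {
            unsigned other = cube_neighbor(node, chan);
            if (node < other)   /* skip the reverse direction */
                printf("node %u <-> node %u\n", node, other);
        }
    }
}
```

Calling print_cables(3) would list the twelve edges of a 3-dimensional cube; the extra cable between the CP and Node 0 is not included.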

3. Advantages of Using PC-Cube 

As an instructional tool, PC-Cube has several advantages over the
commercial hypercubes. One major advantage is its ease of use.
Nodes of commercial hypercubes are not equipped with I/O devices
such as display monitors, yet each node of a PC-Cube, i.e., a PC,
always has either a monochrome or a color display. The availability
of a display device at each node allows users to see and to
demonstrate how parallel algorithms work. The graphics display
capability has high educational value, as applications can display
their results in a more informative and descriptive manner. Viewing
the actions of an application at the nodes also facilitates
debugging because errors can be pinpointed to specific nodes. In
addition, each node has a keyboard, which makes it possible to
implement multi-user applications such as a multi-user expert
system.

Although DOS has a 640 Kbyte memory barrier, users of PC-Cube
employing parallel processing techniques such as domain
decomposition can perform computation on large data sets simply by
decomposing them into smaller subsets and distributing them over
the system. Roughly speaking, a 2 Mbyte data set can be distributed
among 4 node processors, which makes the per-node memory
requirement drop to 512 Kbyte; if 8 node processors are used, each
node would need only 256 Kbyte of memory.
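As a rough illustration of the decomposition arithmetic above, the following C sketch splits the rows of a data set as evenly as possible among the nodes and computes the per-node storage. The one-dimensional row split and the names row_block, block_for_node and kbytes_per_node are assumptions made for this example; the text does not prescribe a particular decomposition.

```c
/* Half-open range [first, first + count) of rows owned by one node. */
struct row_block {
    long first;
    long count;
};

/* Split `rows` rows as evenly as possible over `nodes` nodes.
   The first (rows % nodes) nodes receive one extra row. */
struct row_block block_for_node(long rows, long nodes, long node)
{
    struct row_block b;
    long base  = rows / nodes;
    long extra = rows % nodes;

    b.count = base + (node < extra ? 1 : 0);
    b.first = node * base + (node < extra ? node : extra);
    return b;
}

/* Per-node storage in Kbytes for a data set of `total_kbytes`,
   rounded up to a whole Kbyte. */
long kbytes_per_node(long total_kbytes, long nodes)
{
    return (total_kbytes + nodes - 1) / nodes;
}
```

With a 2048 Kbyte data set, kbytes_per_node gives 512 for 4 nodes and 256 for 8 nodes, matching the figures quoted above.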

Another advantage that may not seem obvious is that PC-Cube is
extremely easy to install or take apart. This feature of PC-Cube
provides the opportunity for every user to have hands-on experience
in setting up a hypercube. Understanding the physical connectivity
of a hypercube and its relation with the control processor at the
outset should make it easier to implement communication strategies
subsequently.

Even when several PCs are physically connected as a hypercube,
users can still use them as stand-alone computers to run their
usual PC software. In addition, the maintenance cost of a PC-Cube
is very low compared to that of a commercial hypercube.

4. Hardware Requirements 

The hardware requirements for PC-Cube depend on the dimension of
the hypercube to be set up. PC-Cubes of dimensions up to 3 have
been tested. In principle, however, PC-Cube can be made larger.

PC-Cube hardware essentially consists of three 
components: 

(1) One IBM PC or compatible for each node in the 
hypercube and for the control processor (CP); 

(2) Standard RS-232 serial port(s) in each PC; 

(3) Ordinary cables with at least 7 wires to be used 
to connect the node PCs and the control pro- 
cessor. 

In general, if n is the number of nodes and d is the dimension of
the hypercube, then

n = 2^d

and

# of PCs = 1 + 2^d

# of cables = 1 + n x d/2

# of serial ports = 2 + n x d

Each node and the CP must be an IBM PC or close compatible, set up
as described in the IBM Guide to Operations manual. Although a hard
disk is not a required device for PC-Cube, it is recommended that
the computer on which applications are to be developed (usually the
CP) be equipped with one. The system software and utilities do not
depend on the type of display monitor or graphics adapter. However,
some of the included demonstration programs will display graphics
at the CP and the node monitors. This means graphics monitors are
required for these graphics demonstration programs to work
properly.

Specifically, a PC-Cube of dimension d requires d serial ports in
each node, except for Node 0, which requires d + 1 ports (the extra
port connects to the CP). The CP always requires one serial port.
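The hardware counts given above can be tabulated with a few lines of C. This is only a sketch of the arithmetic in the text; the structure and function names are hypothetical.

```c
/* Hardware needed for a PC-Cube of dimension d, following the
   formulas in the text. */
struct cube_hardware {
    unsigned nodes;        /* n = 2^d */
    unsigned pcs;          /* node PCs plus the control processor */
    unsigned cables;       /* cube edges plus the CP-to-Node-0 cable */
    unsigned serial_ports; /* one port at each end of every cable */
};

struct cube_hardware cube_requirements(unsigned d)
{
    struct cube_hardware hw;

    hw.nodes        = 1u << d;
    hw.pcs          = 1u + hw.nodes;
    hw.cables       = 1u + hw.nodes * d / 2u;
    hw.serial_ports = 2u + hw.nodes * d;
    return hw;
}
```

For the largest tested configuration, d = 3, this gives 9 PCs, 13 cables and 26 serial ports. The port count agrees with the per-node rule: seven nodes with 3 ports, Node 0 with 4, and the CP with 1.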


Each serial port must be located at a unique address in the I/O
address space. When purchasing serial ports for PC-Cube, it is
important to ensure that the I/O addresses used by the serial ports
do not conflict with each other (or with the I/O addresses of any
other hardware on the PC).

PC-Cube keeps a list of possible serial port addresses. During
initialization, PC-CrOS examines each one of these addresses in
turn to determine whether or not there is a serial port located at
that address. The first port that PC-CrOS finds becomes channel 0.
The second port that it finds becomes channel 1, and so on.
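The channel numbering rule can be sketched in C as follows. The actual address list and the way PC-CrOS detects a port are not given in the text, so the probe is passed in as a callback and a stand-in probe is supplied for illustration; MAX_CHANNELS, probe_fn, assign_channels and fake_probe are all hypothetical names.

```c
#include <stddef.h>

#define MAX_CHANNELS 8   /* assumed upper bound for this sketch */

/* Hypothetical probe callback: returns nonzero if a serial port
   responds at I/O address `addr`. The real PC-CrOS test is not
   described in the text. */
typedef int (*probe_fn)(unsigned addr);

/* Scan `candidates` in order; the first port found becomes channel 0,
   the second channel 1, and so on. Returns the number of channels. */
size_t assign_channels(const unsigned *candidates, size_t ncandidates,
                       probe_fn probe, unsigned channel_addr[MAX_CHANNELS])
{
    size_t i, nchan = 0;

    for (i = 0; i < ncandidates && nchan < MAX_CHANNELS; i++) {
        if (probe(candidates[i]))
            channel_addr[nchan++] = candidates[i];
    }
    return nchan;
}

/* Stand-in probe for illustration only: pretends that ports exist
   at the conventional COM1 and COM2 addresses. */
int fake_probe(unsigned addr)
{
    return addr == 0x3F8 || addr == 0x2F8;
}
```

Because channel numbers follow the order of the address list rather than the physical cabling, the same cable plan has to be matched to port addresses consistently on every node.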

A PC-Cube cable is an ordinary cable with at least 7 wires and two
25-pin RS-232 connectors. The cable is used to tie the following
pins of the two RS-232 connectors together as indicated in Fig. 1:



Fig. 1 Cable Configuration


In addition, connect pins 8 and 22 to 7 of the same 
connector using short pieces of wire, i.e., short pins 8 
and 22 to ground (pin 7). 

5. Software Requirements 

For PC-Cube to run properly, the minimum software requirements are
as follows:

(1) DOS version 2.0 or later (except for the execution profiler,
which requires at least 3.0)

(2) A C compiler. The PC-Cube software pack- 
age was developed with Microsoft C version 4.0. 
Most of the code in this package also works with 
Turbo C version 1.0. 

If a user knows how to use the Microsoft mixed language interface,
the C library provided by this package can be accessed from
MS-Fortran or MS-Pascal programs.

6. The PC-Cube Package

Two hypercube communications systems have been ported from the
Caltech/JPL Mark III hypercube to the PC-Cube environment: CrOS III
and the Mercury Operating System. Also ported are the Cubix and
Plotix parallel I/O and graphics libraries.

PC-CrOS, PC-MOS, PC-Cubix and PC-Plotix are highly compatible with
the original versions. A well-written hypercube application can be
ported between PC-Cube and other commercial hypercubes with only
minimal modifications. In other words, programmers who use PC-Cube
for code development can scale up the size of the physical problem
and run the same piece of code on other hypercubes.

PC-specific utilities include a built-in performance monitor for
PC-CrOS, a general execution profiler, and a simple graphics
library. Brief descriptions of the PC-Cube system software and
utilities are given below (for detailed descriptions of the
software, please see the C3P documents referenced):

PC-CrOS 

CrOS is a channel-based, point-to-point, polled communication
system. CrOS allows directly connected nodes of a hypercube to
communicate with each other. Messages can be sent over longer
distances by using several hops, but the programmer has to provide
the forwarding instructions explicitly.
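Since multi-hop forwarding is left to the programmer, an application needs some rule for choosing the intermediate channels. One common choice, shown here purely as an assumed example and not as part of CrOS, is dimension-order routing: correct the differing address bits one at a time, lowest bit first. The function name plan_route is hypothetical, and the sketch assumes that channel k joins nodes whose numbers differ in bit k.

```c
/* Fill `chans` with the channel numbers to traverse from `src` to
   `dst`, lowest differing bit first (dimension-order routing).
   `chans` must have room for one entry per cube dimension.
   Returns the number of hops. */
unsigned plan_route(unsigned src, unsigned dst, unsigned chans[])
{
    unsigned diff = src ^ dst;   /* bits where the addresses differ */
    unsigned chan = 0, hops = 0;

    while (diff != 0) {
        if (diff & 1u)
            chans[hops++] = chan;
        diff >>= 1;
        chan++;
    }
    return hops;
}
```

The number of hops equals the number of bits in which the two node numbers differ, so no route in a cube of dimension d needs more than d hops.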

When a node is ready to send a message, that node is blocked until
the receiver is ready. Similarly, if a node wants to receive a
message, that node is blocked until the message is sent. Since the
processors cannot proceed until the read or write is done, the
processors move in lockstep through their program. Each read and
write command resynchronizes the processors.

In order to achieve reliable communications, PC-CrOS uses a
synchronous communications protocol. With the serial port hardware
configured to operate at 115.2 Kbaud, PC-CrOS is able to achieve an
effective data transmission rate of approximately 47 Kbaud.
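As a quick check of what these two rates imply, the ratio of effective rate to line rate is:

```latex
\[
\frac{\text{effective rate}}{\text{line rate}}
  = \frac{47\ \text{Kbaud}}{115.2\ \text{Kbaud}} \approx 0.41
\]
```

That is, roughly 41% of the raw line rate carries application data. The text does not break down the remainder; it is presumably the cost of the synchronous protocol used for reliability.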

PC-MOS 

MOS is an interrupt-driven communication system which provides the
nodes of a hypercube with the capability of passing messages
between nodes that are not physically linked by a channel. When a
packet arrives, the processor is interrupted and the packet is
read. If a packet for another processor arrives, it will be
forwarded without any effect on the application program. In other
words, message transmission and reception can proceed concurrently
with the execution of the application program. Since processors are
interrupted for read and write, there is no requirement of
synchronization between the communicating processors, i.e., the
processors can run asynchronously. MOS also provides a synchronous
mode of communication similar to CrOS.

On a PC-Cube, CrOS communications are typ- 
ically twice as fast as MOS communications. 

PC-Cubix 

Cubix is a model of programming a hypercube 
without programming the control processor. It trans- 
parently provides the functionality of I/O to the node 
program. One of the features of Cubix is that a prop- 
erly written Cubix program for hypercube computers 
can be compiled and executed with no changes on 
a sequential machine. The only requirement is that 
an appropriate Cubix library exists on the sequential 
computer. 

PC-Cubix implements a subset of the Cubix library. Unix functions
such as getgid, ttyname, setgid and getlogin are not supported.
Additional functions such as mkdir, rmdir, ungetch, getch, putch
and kbhit are available for PC-Cubix.

It also allows I/O operations to be performed on 
the nodes. For low-level I/O, this is done by intro- 
ducing another set of low-level routines that operate 
on the nodes instead of on the CP. For stream I/O, 
locality is just another stream attribute (like
singular/multiple). Thus, the same stream I/O routines
can be used to manipulate both local and CP file 
streams. Local I/O provides useful information for 
debugging concurrent programs. 

PC-Plotix 

Plotix is a simple graphical system for the hypercube.
It runs under Cubix and Unix. It is an
extension to Cubix which allows node programs to 
draw graphics on the CP in a portable manner. 

PC-Plotix is an implementation of Plotix. PC- 
Plotix does not support functions such as polygon fill 
and certain line types. It supports both CGA and 
EGA. 

CrOS Performance Monitor 

The CrOS Performance Monitor is a facility built into PC-CrOS. It
allows the measurement of how much time a program or program
segment spends on computation, on communication, on idling while
waiting for communication to begin (usually communication and idle
time are lumped together as communication overhead), and on
performing file I/O from inside a CrOS function.

A user can at any time turn data collection on or off, retrieve
the current statistics, or print a summary of profiling statistics
to a file stream. The CrOS Performance Monitor has a real-time
mode. When real-time mode is turned on, this facility will
continuously display on each node both the current state of CrOS
(computing, sending, receiving, or waiting on a particular channel)
and the timing data, presented in a graphical form. The performance
monitor provides information about the load balancing of various
algorithm implementations and indicates inefficiencies of different
decomposition strategies.

Execution Profiler 

While the CrOS Performance Monitor keeps track of profiling
statistics for CrOS-specific functions, the execution profiler
works on both parallel and sequential programs. Programs are large,
complex systems, and a personal computer executes hundreds of
thousands of instructions per second. The execution profiler is a
tool which samples the instruction pointer (IP) of a PC at fixed
time intervals and gives a measure, in a statistical sense, of how
a program or program segment spends its execution time.
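The statistical idea behind IP sampling can be sketched in C: each sampled address is attributed to the routine whose address range contains it, and the hit counts approximate the share of time spent in each routine. The table of routine start addresses and the names routine_for_ip and tally_samples are assumptions for illustration; the profiler's actual data structures are not described in the text.

```c
#include <stddef.h>

/* Find the routine containing `ip`, given routine start addresses
   sorted in ascending order. Returns the index of the last start
   address <= ip, or -1 if ip lies below the first routine. */
int routine_for_ip(const unsigned long *starts, size_t nroutines,
                   unsigned long ip)
{
    size_t lo = 0, hi = nroutines;   /* search window [lo, hi) */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (starts[mid] <= ip)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (int)lo - 1;
}

/* Add one hit to the routine containing each sampled IP value.
   Samples below the first routine are ignored. */
void tally_samples(const unsigned long *starts, size_t nroutines,
                   const unsigned long *samples, size_t nsamples,
                   unsigned long *hits)
{
    size_t i;

    for (i = 0; i < nsamples; i++) {
        int r = routine_for_ip(starts, nroutines, samples[i]);
        if (r >= 0)
            hits[r]++;
    }
}
```

Dividing each hit count by the total number of samples gives an estimate of the fraction of execution time spent in that routine; the estimate improves as more samples are collected.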

The function init_prof() opens a file for storing profiling data,
and end_prof() closes the file. The start_prof() and stop_prof()
functions turn profiling on and off. A special feature of this
execution profiler is that it can accumulate profiling statistics
following the control flow of a program instead of its structure,
by calling start_special() and stop_special() in the program. Two
kinds of report can be generated: a symbol report or a line report.

Users can use the execution profiler to fine-tune 
their programs for better performance. 

GRAFIX 

GRAFIX is a collection of simple graphics primitives as well as
functions that emulate part of the commercially available HALO
graphics library. PC-Plotix uses the Grafix library to perform the
actual graphics operations. PC-Cube programs that call functions
provided directly by Grafix or by any other PC graphics library are
not portable to commercial hypercubes.

7. Experience with Code Compatibility

Several application programs were taken from the Caltech/JPL Mark
Series hypercube and ported to a PC-Cube. The algorithms used in
these programs are described in detail in the book Solving Problems
on Concurrent Processors [Fox 88]. These programs range from
solving the 1-D wave equation for a simple vibrating string, the
2-D Laplace equation in rectangular coordinates using a finite
difference approximation, and a 3-D simulation of the dynamics of a
number of particles governed by an attractive long-range force such
as gravity, to an ecological simulation of sharks and fish on the
toroidal planet Wa-Tor [Dewdney 85]. Also, a Mandelbrot set solver
[Dewdney 84] to explore the Julia curve was written.
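To give a flavor of the finite difference computation in the Laplace program, here is a minimal C sketch of one relaxation sweep over a node's local grid. The text does not say which iteration scheme the original program uses; a Jacobi-style update with the five-point stencil is assumed here, and the grid size and the name jacobi_sweep are hypothetical. In a parallel version each node would hold one sub-grid and exchange edge values with its neighbors between sweeps, which is omitted.

```c
#define NX 6   /* local grid columns, including boundary (assumed size) */
#define NY 5   /* local grid rows, including boundary (assumed size) */

/* One Jacobi sweep for the 2-D Laplace equation: each interior point
   becomes the average of its four neighbors in `old`. Boundary values
   are copied unchanged. Returns the largest change at any point. */
double jacobi_sweep(double old[NY][NX], double next[NY][NX])
{
    double max_change = 0.0;
    int i, j;

    for (i = 0; i < NY; i++) {
        for (j = 0; j < NX; j++) {
            double v = old[i][j];
            double change;

            if (i > 0 && i < NY - 1 && j > 0 && j < NX - 1)
                v = 0.25 * (old[i - 1][j] + old[i + 1][j] +
                            old[i][j - 1] + old[i][j + 1]);

            change = v > old[i][j] ? v - old[i][j] : old[i][j] - v;
            if (change > max_change)
                max_change = change;
            next[i][j] = v;
        }
    }
    return max_change;
}
```

Repeating the sweep, swapping the roles of the two arrays each time, drives the returned maximum change toward zero as the grid approaches the steady-state solution.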

Except for the graphics enhancement, the PC-Cube versions of these
application programs are exactly the same as the implementations on
the Mark Series hypercubes, thus demonstrating the software
compatibility of PC-Cube with current CrOS III, Mercury, and Cubix
programs.

8. Efficiency of PC-Cube

Although PC-Cube is designed to be an educational tool for parallel
processing and not for high performance, it provides a speedup
relative to a single node PC. Figures 2 to 5 illustrate the timing
and efficiencies of PC-Cube applications using both PC-CrOS and
PC-MOS communications to implement a Laplace equation solver on a
2-dimensional grid.


Fig. 2 Timing for CrOS Laplace Demo. Shows the decreasing amount of
time needed to compute 100 iterations of updating on a 160x80 grid
when an increasing number of processors are used. In this case the
problem is solving the 2-D Laplace equation. The operating system
used to get this data was CrOS III.

Fig. 3 Efficiency Data for CrOS Laplace Demo. Shows the efficiency
(= speedup/# of nodes) of computing 100 iterations of updating on a
160x80 grid and on a 40x20 grid for the Laplace demo. Note that the
efficiency is better for the 160x80 grid problem because the
communications overhead is smaller, so each node spends a larger
proportion of its time on computation. The operating system used to
get this data was CrOS III.

Fig. 4 Timing for Mercury Laplace Demo. Shows the decreasing amount
of time needed to compute 100 iterations of updating on an 80x80
grid when an increasing number of processors are used. In this case
the problem is solving the 2-D Laplace equation. The operating
system used to get this data was Mercury.

Fig. 5 Efficiency Data for Mercury Laplace Demo. Shows the
efficiency (= speedup/# of nodes) of computing 100 iterations of
updating on an 80x80 grid for the Laplace demo. Note that the
efficiency decreases as the number of processors increases. This is
due to the increasing amount of communication needed with more
processors. The operating system used to get this data was Mercury.






The figures indicate that the finer the grid, the smaller the
communication overhead relative to the computational load, and thus
the higher the efficiency. The CrOS version of the Laplace solver
achieves about 75% efficiency with 8 nodes working on a 160x80
grid.
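Using the definition of efficiency from the figure captions (speedup divided by the number of nodes), the quoted efficiency can be converted back into a speedup. The symbols T_1 and T_N, for the run time on one node and on N nodes, are introduced here only for illustration:

```latex
\[
\varepsilon = \frac{S}{N} = \frac{T_1}{N\,T_N},
\qquad
S = \varepsilon N \approx 0.75 \times 8 = 6
\]
```

In other words, an efficiency of about 75% on 8 nodes corresponds to running roughly 6 times faster than a single node PC on the same problem.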

9. Conclusions 

PC-CUBE is an inexpensive and easy-to-install hypercube system. It
is an indispensable tool for learning hypercube programming and
parallel processing in general. The capability of nodal text and
graphics output is particularly important for debugging purposes as
well as for providing insights into the physical problem at hand.
The performance monitor aids the development of efficient,
load-balanced concurrent codes by beginners and experienced
programmers alike. The execution profiler helps users to fine-tune
their programs, both sequential and parallel, for higher
performance. Last but not least, concurrent programs that are
written for PC-CUBE are upward compatible. The same piece of code
can be straightforwardly ported to the larger commercial hypercube
computers with little or even no modification.


10. Appendix

PC-CXLISP (concurrent XLISP) [Ho 88], a parallel version of the
public domain software XLISP, an experimental LISP interpreter, has
been implemented on PC-Cube. CXLISP adds Mercury communication
functions to the original XLISP interpreter.

When CXLISP is executed on PC-Cube, the CP prints the CXLISP
startup message and downloads the CXLISP node program. After the
download process completes, the CP enters the read-eval-print loop
and waits for input from the keyboard. The nodes, each running an
XLISP interpreter, also print the CXLISP startup message and enter
the read-eval-print loop; however, they do not read input from
keyboards. Rather, they wait for a Mercury message from other nodes
or from the CP.

There is also an alternate version of the CXLISP
node program which takes input from keyboards at
the nodes instead of Mercury messages. This version 
is useful for debugging. 

CXLISP is not distributed with the PC-Cube 
package because the original author of XLISP has not 
yet been contacted in this regard. 


References

[Dewdney 84] A.K. Dewdney, Scientific American, Vol. 251, #6, pp. 16-24, Dec. 1984.

[Dewdney 85] A.K. Dewdney, Scientific American, Vol. 253, #2, pp. 18-23, August 1985.

[Flower 86] Jon Flower and Roy Williams, "Plotix - A Graphical System to Run Cubix and Unix",
Caltech report C3P 285, 1986.

[Fox 88] Geoffrey C. Fox, Mark A. Johnson, Gregory A. Lyzenga, Steve W. Otto, John K. Salmon,
and David W. Walker, "Solving Problems on Concurrent Processors", pub. Prentice-Hall,
Englewood Cliffs, New Jersey, 1988.

[Ho 88] Alex W. Ho, Scott Snyder and Douglas Chang, "User's Guide for PC-Cube, The IBM PC-based
Hypercube", Caltech report C3P 563, 1988.

[Ho 88b] Ho et al., "MAC-Cube, A Macintosh-based Hypercube", this volume, 1988.

[Ho 88c] Alex W. Ho and Scott Snyder, "CXLISP - A Concurrent XLISP Interpreter on the Hypercube",
Caltech report C3P 559, 1988.

[Kolawa 86] Adam Kolawa and Barbara Zimmerman, "CrOS III Manual", Caltech report C3P 253B, 1986.

[Lee 86] Roger Lee, "Mercury I/O Library User's Guide, C Language Edition", Caltech report C3P
301, 1986.

[Salmon 86] John Salmon, "Cubix: An I/O System for the Hypercube", Caltech report C3P 285, 1986.

DISCLAIMER 


This report was prepared as an account of work sponsored by an agency of the United States
Government. Neither the United States Government nor any agency thereof, nor any of their
employees, makes any warranty, express or implied, or assumes any legal liability or
responsibility for the accuracy, completeness, or usefulness of any information, apparatus,
product, or process disclosed, or represents that its use would not infringe privately owned
rights. Reference herein to any specific commercial product, process, or service by trade name,
trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement,
recommendation, or favoring by the United States Government or any agency thereof. The views
and opinions of authors expressed herein do not necessarily state or reflect those of the
United States Government or any agency thereof.


